---
title: "Integrated ecosystem assessment on PCA (principal component analysis) for Black sea pelagic ecosystem, 1994-2017"
author: "Author: Piatinskii M., Azov-black sea branch of VNIRO"
date: 'Report build date: `r Sys.time()`'
output:
  html_document:
    toc: true
    toc_depth: 3
    toc_float: true
    theme: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## 1. About
IEA - integrated ecosystem assessment is an approach that engages scientists, stakeholders, managers to integrate all components of an ecosystem into decision-making process. This approach provides the science necessary to carry out Ecosystem-Based management.

PCA - is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.

PCA assessment done with `procom()` method. The calculation is done by a singular value decomposition of the (centered and scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy. The print method for these objects prints the results in a nice format and the plot method produces a scree plot.

Refs:

  - Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
  - Mardia, K. V., J. T. Kent, and J. M. Bibby (1979) Multivariate Analysis, London: Academic Press.
  - Venables, W. N. and B. D. Ripley (2002) Modern Applied Statistics with S, Springer-Verlag.
  - [Rodionov, 2004. A sequential algorithm for testing climate regime shifts](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2004GL019448)

## 2. Input data
Here you can find input data to PCA analysis.

```{r echo=FALSE}
knitr::kable(data, digits = 3, caption = "Table 2.1. PCA input data")
```

## 3. Data diagnostics
There is preliminary data diagnostics procedure shown. Anomalies and traffic light plot should be reviewed to make primary data identification. 

### 3.1. Anomalies
This section shows the anomaly plots for all input data columns. Anomalies (a{x}) measurement calculated in simple way:

$$
\overline{x}={\frac {1}{n}}\sum _{i=1}^{n}x_{i}={\frac {x_{1}+x_{2}+\cdots +x_{n}}{n}} \\
a_{x}=x_{i}-\overline{x}
$$

where x - observations variants row. 

```{r echo=FALSE, fig.cap="Fig. 3.1.1. Sprat anomaly diagnostics"}
knitr::include_graphics("output/anomaly/sprat.png", dpi = 100)
```

```{r echo=FALSE, fig.cap="Fig. 3.1.2. Environment anomaly diagnostics"}
knitr::include_graphics("output/anomaly/env.png", dpi = 100)
```

```{r echo=FALSE, fig.cap="Fig. 3.1.3. Producers & consumers(1 rank)"}
knitr::include_graphics("output/anomaly/producers_consumers.png", dpi = 100)
```

```{r echo=FALSE, fig.cap="Fig. 3.1.4. Top Consumers (3 rank)"}
knitr::include_graphics("output/anomaly/predators.png", dpi = 100)
```

### 3.2. Traffic light
Traffic light plot show input data continious changes based on quantile deviatians. Quantile calculated for sorted data in next intervals: 

  - 0 ... 0.2 - shows high significant reduction in observation year by factor
  - 0.2 ... 0.4 - shows moderate reduction in observation year by factor
  - 0.4 ... 0.6 - shows no significant changes in observation year by factor
  - 0.6 ... 0.8 - shows moderate increase in observation year by factor
  - 0.8 ... 1 - show high significant increase in observation year by factor
  
```{r echo=FALSE, fig.cap = "3.2.1. Traffic light diagnostics plot"}
knitr::include_graphics("output/pca/col/traffic-light.png")
```

## 4. PCA analysis
Principle component analysis (PCA) done via `prcomp()` default method in R. The calculation is done by a singular value decomposition of the (centered and scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy. The print method for these objects prints the results in a nice format and the plot method produces a scree plot.

Unlike princomp, variances are computed with the usual divisor N - 1.

### 4.1. Component proportions
There is component st.dev, proportion of variance and cumulative proportion results shown.

```{r echo=FALSE}
knitr::kable(fit.sum, caption = "Table 4.1.1. Component proportions", digits = 2)
```

```{r echo=FALSE, fig.cap="Fig. 4.1.1. Component variance explained percentage"}
knitr::include_graphics("output/pca/col/variance-parts.png")
```

Main value is `Proportion of Variance` show the proportion of explained variance in all data. For example, PC1 proportion variance = 0.45 said that the 45% of data changes are explained by impact of PC1 component.

### 4.2. Component contributions
Here the single contribution of each factor are summarized for main components PC1 - PC3. This impact can be used to perform another assay based on PCA output.

```{r echo=FALSE}
knitr::kable(fit.impact[,-ncol(fit.impact)], caption = "Table 4.2.1. Component contribution", digits = 3)
```

```{r echo=FALSE, fig.cap="Fig. 4.2.1. Factor contribution to PC"}
knitr::include_graphics("output/pca/col/contrib-pc13.png")
```

```{r echo=FALSE, fig.cap="Fig. 4.2.2. Individual year contribution to PCA"}
knitr::include_graphics("output/pca/col/factor-contrib-year.png")
```

### 4.3. Weighted rotated matrix
In this section raw rotated standard deviation matrix to each one pc-dimension shown. Ofter this matrix used to explain component changes year-by-year and can be used to explaine regime shifts. 

```{r echo=FALSE}
knitr::kable(fit.score[,-ncol(fit.score)], caption = "Table 4.3.1. Stdev PC rotation matrix")
```

In human way this values in table can be interprited as a distance from the center (mean) of PC dimensions. High values leads to high distance from default dimension center. 

### 4.4. Main component biplot
There is a 2-dimensional (biplot) scientific graphs explaining PC1 changes over PC2 (PC1 - 1st dimension, PC2 - 2nd dimension in xy plot). Vectors with same directions show the positive correlation between them. Opositve vectors show negative corelated factors. 

```{r echo=FALSE, fig.cap="Fig. 4.4.1. Biplot PC1, PC2 component dimensions"}
knitr::include_graphics("output/pca/col/variable-pc1-vs-pc2.png")
```

```{r echo=FALSE, fig.cap="Fig. 4.4.2. Biplot PC1, PC2 with emperical year values"}
knitr::include_graphics("output/pca/col/biplot-year-variance.png")
```

```{r echo=FALSE, testgl, webgl=TRUE, fig.cap="Fig 4.4.3. 3D plot DIM 1-3 year variance"}
library("rgl")
#knitr::knit_hooks$set(webgl = hook_webgl)
fit.coord <- fit.score[,2:4]
fit.coord.offset <- fit.coord * 0.1
gr <- factor(nb$Best.partition*2)
fit.coord$col <- gr
plot3d(x = fit.coord[,1], 
       y = fit.coord[,2], 
       z = fit.coord[,3],
       xlab = "Dim 1",
       ylab = "Dim 2",
       zlab = "Dim 3",
       size = 10, 
       col = fit.coord$col)

lines3d(x = fit.coord[,1], y = fit.coord[,2], z = fit.coord[,3], col = fit.coord$col)
text3d(x = fit.coord[,1] + fit.coord.offset[,1], 
       y = fit.coord[,2] + fit.coord.offset[,2], 
       z = fit.coord[,3] + fit.coord.offset[,3], 
       fit.score$year, 
       col = fit.coord$col,
       cex = 0.7,
)
rglwidget()
```

## 5. Regime shift detection
Regime shift detection approach based on PCA results. Instant changes in PC stdevs show the group regime shifts. 

### 5.1. Silhouette cluster numbers
Detection of numbers of clusters are provided below. Silhouette methods used.
```{r echo=FALSE, fig.cap="Fig. 5.1.1. Optimal numbers of clusters by k-means method within Silhouette"}
knitr::include_graphics("output/pca/col/pca-kmeans-cluster-silhouette.png")
```

### 5.2. K-means clustering
Based on PCA coordinates, k-means clustering was performed with number of clusters by silhouette method.
```{r echo=FALSE, fig.cap="Fig. 5.2.1. K-means clustering: PC1 VS PC2"}
knitr::include_graphics("output/pca/col/pca-kmeans-pc-1-2.png")
```

```{r echo=FALSE, fig.cap="Fig. 5.2.2. K-means clustering: PC1 VS PC3"}
knitr::include_graphics("output/pca/col/pca-kmeans-pc-1-3.png")
```

```{r echo=FALSE, fig.cap="Fig. 5.2.1. K-means clustering: PC2 VS PC3"}
knitr::include_graphics("output/pca/col/pca-kmeans-pc-2-3.png")
```

### 5.3. Clustering PC's
In this section hierarchical cluster analysis for dissimilarities shown. Cluster analysis done by `hclust()` function with method "ward.D2". Ward2 method use squared Euclidian distance minimization between component variances to determine hierarchical relations.

```{r echo=FALSE, fig.cap="Fig. 5.3.1. Hierarchical cluster analysis for PC variations"}
knitr::include_graphics("output/pca/bw/pca-cluster-dendrogram.png")
```

### 5.4. Component score changes
```{r echo=FALSE, fig.cap="Fig. 5.4.1. PC1, PC2, PC3 stdev center distance changes"}
knitr::include_graphics("output/pca/col/pca-regime-shift.png")
```

Now, Rodionov regime-shift detection results. There is typical Rodeonov (2004) regime shift detection alorithm pefrormed. Mode: mean, p = 0.05, L = 10. Regime shift summary plot shown below. 

```{r echo=FALSE, fig.cap="Fig. 5.4.2. Rodionov regime-shift for PCA visualisation"}
knitr::include_graphics("output/pca/col/pca-regime-shift-radeonov.png")
```

```{r echo=FALSE, fig.cap="Fig 5.4.3. Rodionov regime-shift for PCA and data"}
knitr::include_graphics("output/pca/col/pca-regime-shift-data-radeonov.png")
```


## 6. Pearson correlation relationships
This section shows raw cross-correlation results for all data values. This analysis design is a typical for 1980-1990 years but not useful for present days. 

```{r echo=FALSE}
knitr::kable(cor.mat, digits = 2, caption = "Table 6.1. Pearson coefficient values (r) output")
```

In the figure below the summary short results shown. Only colored cells are significant and they correlation is true. 

```{r echo=FALSE, fig.cap="Fig. 6.1. Cross-correlation Pearson test on sign.level = 0.05"}
knitr::include_graphics("output/correlation-test-col.png")
```